Improving Verb Phrase Extraction by Targeting Phrasal Verbs based on Valency Frames
Abstract
In the Gender and Work project (GaW), historians are building a database with information on what men and women did for a living in Early Modern Swedish society, i.e. approximately 1550–1800 [1]. This information is currently extracted manually: researchers go through large volumes of court records and church documents, searching for text passages that describe working activities. In this process, it has been noticed that working activities are often described in the form of verb phrases, such as hugga ved ("chop wood"), sälja fisk ("sell fish") or tjäna som piga ("serve as a maid"). In previous work, in cooperation with the historians, I have developed a method for automatically extracting verb phrases from historical documents by means of spelling normalisation, tagging and parsing [4]. With this approach, most of the verbs in historical Swedish texts are correctly extracted. However, because historical texts differ from present-day Swedish in word order and have significantly longer sentences, it is still hard for the parser to identify precisely which complements are associated with a given verb in a sentence. The aim of the proposed project is to improve verb phrase extraction by providing verb valency information in the extraction step, with a special focus on phrasal verbs. In this project, I use the term phrasal verb in its broader sense, including particle phrasal verbs such as äga rum ("take place"), prepositional phrasal verbs such as handla med ("trade with"), and combinations of the two, as in betala igen till ("pay back to").
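To make the idea of valency-guided extraction more concrete, here is a minimal sketch of how valency frames could decide which dependents of a verb belong in the extracted phrase. It assumes the sentence has already been spelling-normalised and dependency-parsed. The dependency labels (`prt`, `obj`, `pp`, `pobj`, `subj`, `adv`), the `Frame` structure and the toy `LEXICON` are hypothetical illustrations, not the representation used in the project, nor the label set of any particular parser or treebank. Frames are tried from most to least specific, so a phrasal reading such as betala igen till is preferred over the plain transitive one, and dependents not licensed by the matched frame (here the subject and the adverb) are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Token:
    """One token of a normalised, dependency-parsed sentence."""
    idx: int      # 1-based position in the sentence
    form: str     # normalised word form
    lemma: str
    pos: str
    head: int     # idx of the syntactic head, 0 for the root
    deprel: str   # hypothetical labels: "subj", "obj", "prt", "pp", "pobj", "adv"


@dataclass(frozen=True)
class Frame:
    """A hypothetical valency frame: the complements a verb reading requires."""
    particle: Optional[str] = None     # e.g. "rum" in "äga rum"
    preposition: Optional[str] = None  # e.g. "med" in "handla med"
    takes_object: bool = False         # a direct object is required


# Illustrative toy lexicon only, NOT an actual valency dictionary.
LEXICON: Dict[str, List[Frame]] = {
    "äga": [Frame(particle="rum")],
    "handla": [Frame(preposition="med"), Frame(takes_object=True)],
    "betala": [Frame(particle="igen", preposition="till", takes_object=True),
               Frame(takes_object=True)],
    "hugga": [Frame(takes_object=True)],
    "tjäna": [Frame(preposition="som")],
}


def children(sent: List[Token], i: int) -> List[Token]:
    """Direct dependents of token i."""
    return [t for t in sent if t.head == i]


def subtree(sent: List[Token], i: int) -> List[Token]:
    """Token i plus everything it dominates (assumes sent[k - 1].idx == k)."""
    out = [sent[i - 1]]
    for c in children(sent, i):
        out.extend(subtree(sent, c.idx))
    return out


def match_frame(sent: List[Token], verb: Token, frame: Frame) -> Optional[List[Token]]:
    """Return the verb plus the complements the frame asks for, or None if a slot is unfilled."""
    deps = children(sent, verb.idx)
    picked = [verb]
    if frame.particle is not None:
        prt = [d for d in deps if d.deprel == "prt" and d.lemma == frame.particle]
        if not prt:
            return None
        picked.append(prt[0])
    if frame.takes_object:
        obj = [d for d in deps if d.deprel == "obj"]
        if not obj:
            return None
        picked.extend(subtree(sent, obj[0].idx))
    if frame.preposition is not None:
        pp = [d for d in deps if d.deprel == "pp" and d.lemma == frame.preposition]
        if not pp:
            return None
        picked.extend(subtree(sent, pp[0].idx))  # preposition plus its object
    return sorted(picked, key=lambda t: t.idx)   # restore sentence order


def specificity(frame: Frame) -> int:
    """Number of slots the frame requires."""
    return sum([frame.particle is not None, frame.preposition is not None, frame.takes_object])


def extract_verb_phrase(sent: List[Token], verb: Token, lexicon: Dict[str, List[Frame]]) -> str:
    """Try the verb's frames from most to least specific; fall back to the bare verb."""
    frames = sorted(lexicon.get(verb.lemma, []), key=specificity, reverse=True)
    for frame in frames:
        tokens = match_frame(sent, verb, frame)
        if tokens is not None:
            return " ".join(t.form for t in tokens)
    return verb.form


# "Hon betalade igen pengarna till bonden igår" (She paid the money back to the farmer yesterday)
sentence = [
    Token(1, "Hon", "hon", "PN", 2, "subj"),
    Token(2, "betalade", "betala", "VB", 0, "root"),
    Token(3, "igen", "igen", "PL", 2, "prt"),
    Token(4, "pengarna", "pengar", "NN", 2, "obj"),
    Token(5, "till", "till", "PP", 2, "pp"),
    Token(6, "bonden", "bonde", "NN", 5, "pobj"),
    Token(7, "igår", "igår", "AB", 2, "adv"),
]
print(extract_verb_phrase(sentence, sentence[1], LEXICON))  # betalade igen pengarna till bonden
```

Trying the most specific frame first is just one simple way to prefer the phrasal-verb reading; a fuller system would also have to cope with parser attachment errors, for example a particle or prepositional phrase attached to the wrong head in a long historical sentence.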